
    A Sparse-Modeling Based Approach for Class Specific Feature Selection

    In this work, we propose a novel feature selection framework, the Sparse-Modeling Based Approach for Class Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology): it makes it possible to work with models that have fewer variables, which are easier to explain, provide valuable insight into the importance of each variable, and are likely to speed up experimental validation. Unfortunately, as the no free lunch theorems also suggest, no approach in the literature is best suited to detecting the optimal feature subset for building a final model, so this remains an open challenge. The proposed feature selection procedure follows a two-step approach: (a) a sparse-modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme to assess how effective the selected features are in classification tasks. To this end, an ensemble of classifiers is built, in which each classifier is trained on its own feature subset discovered in the previous phase, and a suitable decision rule is adopted to compute the ensemble's overall response. To evaluate the performance of the proposed method, extensive experiments were performed on publicly available datasets, in particular from computational biology, where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify the most representative features, namely those that maximize classification accuracy.
Using the top 20 and top 80 features, SMBA-CSFS shows promising performance compared with competing methods from the literature on all the considered datasets, especially those with a higher number of features. The experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the approach is a good candidate for the selection and classification of data with a large number of features and classes.
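The two-step procedure can be illustrated with a small, self-contained sketch. The abstract does not say which sparse model, base classifier, or decision rule SMBA-CSFS uses, so every concrete choice below is an assumption made for brevity: a one-vs-rest lasso fitted by coordinate descent stands in for the sparse-modeling step (a), a nearest-centroid scorer trained on each class's own subset stands in for the class-specific classifiers (b), and the ensemble picks the class whose scorer reports the largest margin. All function names and the toy data are hypothetical.

```python
# Hypothetical sketch of a two-step, class-specific pipeline (not the SMBA-CSFS code).
from math import dist


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso on centred data:
    minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: coefficient stays at zero
            # correlation of feature j with the residual that ignores feature j
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def class_subsets(X, labels, lam, k):
    """Step (a): one-vs-rest lasso per class; keep at most k non-zero features."""
    subsets = {}
    for c in sorted(set(labels)):
        y = [1.0 if lab == c else -1.0 for lab in labels]
        beta = lasso_cd(X, y, lam)
        nonzero = [j for j, b in enumerate(beta) if b != 0.0]
        subsets[c] = sorted(nonzero, key=lambda j: -abs(beta[j]))[:k]
    return subsets


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_ensemble(X, labels, subsets):
    """Step (b): one nearest-centroid scorer per class, on that class's own subset."""
    models = {}
    for c, feats in subsets.items():
        pos = [[row[j] for j in feats] for row, lab in zip(X, labels) if lab == c]
        neg = [[row[j] for j in feats] for row, lab in zip(X, labels) if lab != c]
        models[c] = (feats, centroid(pos), centroid(neg))
    return models


def predict(models, x):
    """Assumed decision rule: the class whose scorer reports the largest margin wins."""
    def margin(model):
        feats, mu_pos, mu_neg = model
        z = [x[j] for j in feats]
        return dist(z, mu_neg) - dist(z, mu_pos)
    return max(models, key=lambda c: margin(models[c]))


# Toy data: class "a" is signalled by feature 0, "b" by feature 1, "c" by feature 2.
X = [[5, 0, 0, 1], [6, 0, 1, 0], [0, 5, 0, 1], [1, 6, 0, 0], [0, 0, 5, 1], [0, 1, 6, 0]]
labels = ["a", "a", "b", "b", "c", "c"]
subsets = class_subsets(X, labels, lam=2.0, k=20)
models = fit_ensemble(X, labels, subsets)
print(subsets)
print(predict(models, [5.5, 0.5, 0.2, 1]))
```

The cap `k` plays the role of the "top 20 / top 80 features" setting only loosely; with the strong penalty used here, each class keeps a single feature, and each classifier sees nothing but its own class's subset.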

    Record Linkage of Banks and Municipalities through Multiple Criteria and Neural Networks

    Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process, studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering the census, administrative, and health domains. In this paper, a method is proposed to recognize matching records from real municipality and bank data sources using multiple similarity criteria and a neural network classifier: starting from a labeled subset of the available data, several similarity measures are first combined and weighted to build a feature vector, and then a Multi-Layer Perceptron (MLP) network is trained and tested to find matching pairs. For validation, seven real datasets were used (three from banks and four from municipalities), deliberately chosen from the same geographical area to increase the probability of matches. Training involved only two municipalities, while testing involved all sources (municipalities vs. municipalities, banks vs. banks, and municipalities vs. banks). The proposed method achieved strong results in terms of both precision and recall, clearly outperforming threshold-based competitors.
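The feature-vector step can be sketched as follows. The abstract does not say which similarity measures or weights were combined, so the three measures here (token Jaccard, character-bigram Dice, and Python's `difflib.SequenceMatcher.ratio`), the field names `name` and `address`, and the per-field `weights` dictionary are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def token_jaccard(a, b):
    """Jaccard similarity between the sets of lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def bigram_dice(a, b):
    """Dice coefficient between the sets of lower-cased character bigrams."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}
    ga, gb = grams(a), grams(b)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def sequence_ratio(a, b):
    """difflib's similarity ratio between the lower-cased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


MEASURES = (token_jaccard, bigram_dice, sequence_ratio)


def pair_features(rec_a, rec_b, fields, weights=None):
    """One weighted similarity score per (field, measure) combination."""
    weights = weights or {}
    return [weights.get(f, 1.0) * m(rec_a[f], rec_b[f]) for f in fields for m in MEASURES]


# Hypothetical records from a bank and a municipality.
bank = {"name": "Mario Rossi", "address": "Via Roma 1"}
town = {"name": "ROSSI MARIO", "address": "via roma, 1"}
print(pair_features(bank, town, ["name", "address"], weights={"name": 2.0}))
```

Vectors like these, computed for labeled record pairs, would form the training set for the MLP; any MLP implementation could fill that role, and the abstract does not say which one was used. A threshold-based competitor, by contrast, declares a match when a single score exceeds a fixed cut-off.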

    Assessing the effects of Bt maize on the non-target pest Rhopalosiphum maidis by demographic and life-history measurement endpoints

    The most widely commercialized Bt maize plants in Europe were transformed with genes that express a truncated form of the insecticidal delta-endotoxin (Cry1Ab) from the soil bacterium Bacillus thuringiensis (Bt), specifically targeting Lepidoptera. Studies on the effects of transgenic maize on non-target arthropods have mainly focused on beneficial insects. However, given the extensive worldwide cultivation of Bt maize, more information on its possible impact on non-target pests is also needed. In this study, the impact of Bt maize on the non-target corn leaf aphid, Rhopalosiphum maidis, was examined by comparing the biological traits and demographic parameters of two generations of aphids reared on transgenic maize with those of aphids reared on untransformed near-isogenic plants. In addition, the free and bound phenolic content of transgenic and near-isogenic plants was measured. We show improved performance of the second generation of R. maidis on Bt maize, which could be attributable to indirect effects, such as reduced defense against pests resulting from unintended changes in plant characteristics caused by the insertion of the transgene. Indeed, the comparison of Bt maize with its near-isogenic line strongly suggests that the transformation could have adversely affected the biosynthesis and accumulation of free phenolic compounds. In conclusion, even though there is adequate evidence that aphids performed better on Bt maize than on non-Bt plants, economic damage from aphids has not been reported in commercial Bt corn fields compared with non-Bt corn fields. Nevertheless, Bt maize plants can be more easily exploited by R. maidis, possibly because of the lower level of secondary metabolites in their leaves.
Recognizing this mechanism improves our understanding of how insect-resistant genetically modified plants affect non-target arthropod communities, including tritrophic food-web interactions, and can help support the sustainable use of genetically modified crops.
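The abstract does not list which demographic parameters were computed. As an assumption, the sketch below shows three life-table quantities commonly used in aphid demography: the net reproductive rate R0 = sum of l_x m_x, the intrinsic rate of increase r_m obtained from the Euler-Lotka equation, and the mean generation time T = ln(R0)/r_m. The life-table values in the usage lines are hypothetical.

```python
from math import exp, log


def net_reproductive_rate(lx, mx):
    """R0 = sum over ages of l_x * m_x (expected offspring per individual)."""
    return sum(l * m for l, m in zip(lx, mx))


def intrinsic_rate(ages, lx, mx, lo=-1.0, hi=2.0, tol=1e-10):
    """Solve the Euler-Lotka equation sum(exp(-r*x) * l_x * m_x) = 1 for r by bisection."""
    def f(r):
        return sum(exp(-r * x) * l * m for x, l, m in zip(ages, lx, mx)) - 1.0
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:  # f decreases with r (ages > 0), so the root lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mean_generation_time(r0, r):
    """T = ln(R0) / r, the usual life-table definition."""
    return log(r0) / r


ages = [7, 8, 9, 10]           # days; hypothetical reproductive ages
lx = [0.90, 0.85, 0.80, 0.70]  # hypothetical survivorship to age x
mx = [2.0, 3.0, 3.0, 2.0]      # hypothetical nymphs per female per day
r0 = net_reproductive_rate(lx, mx)
r = intrinsic_rate(ages, lx, mx)
# exp(r) is the finite rate of increase (lambda) per day
print(f"R0 = {r0:.2f}, r_m = {r:.3f}/day, T = {mean_generation_time(r0, r):.2f} days, lambda = {exp(r):.3f}")
```

Computing these quantities separately for cohorts reared on Bt and near-isogenic plants, generation by generation, is one way such endpoints could be compared; the statistical comparison itself is outside this sketch.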

    Selected papers from the 15th and 16th international conference on Computational Intelligence Methods for Bioinformatics and Biostatistics

    CIBB 2019 was held at the Department of Human and Social Sciences of the University of Bergamo, Italy, from 4 to 6 September 2019 []. The organization of this edition of CIBB was supported by the Department of Informatics, Systems and Communication of the University of Milano-Bicocca, Italy, and by the Institute of Biomedical Technologies of the National Research Council, Italy. Besides papers on computational intelligence methods applied to open problems in bioinformatics and biostatistics, the works submitted to CIBB 2019 dealt with algebraic and computational methods to study RNA behaviour, intelligence methods for molecular characterization and dynamics in translational medicine, modeling and simulation methods for computational biology and systems medicine, and machine learning in healthcare informatics and medical biology. A supplement published in the journal BMC Medical Informatics and Decision Making [] collected three revised and extended papers on the latter topic.

    Modeling Green Peach Aphid populations exposed to elicitors inducing plant resistance on peach

    Matrix Population Models (MPMs) are not commonly used to simulate arthropod population dynamics for assessing pest control in agricultural contexts. However, a growing body of studies is prompting the search for optimization techniques that reduce uncertainty in the estimation of matrix parameters. Indeed, uncertainty in parameter estimates may have significant management implications. Here we present a case study in which MPMs are used to assess the efficacy of treatment with elicitors that induce plant resistance against pathogens, such as laminarin, for the control of Green Peach Aphid (Myzus persicae Sulzer) populations on peach. Such a demographic approach could be particularly suitable for studying this kind of compound, which mainly causes sub-lethal effects rather than acute mortality. An artificially assembled system [1] was set up, since it is well suited to following the fate and behavior of a population exposed to elicitors that activate chemical defenses in the plant. The data obtained, consisting of population time series, were used to generate a stage-classified projection matrix. The general model used to simulate population dynamics consists of a matrix containing (i) survival probabilities (the probability of surviving and growing into the next stage, and the probability of surviving and remaining in the same stage) and (ii) the fecundities of the population. Most methods for estimating the parameter values of stage-classified models rely on following cohorts of identified individuals [2]. In this study, however, the observed data consisted of a time series of population vectors in which individuals are not distinguished. Inferring the values of the matrix parameters that produced the observed series is an estimation task known as an inverse problem.
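As an illustration of the model structure described above, a generic three-stage version can be written as follows; the number of stages and the assumption that only the last stage reproduces are ours, not the study's. Here $P_i$ is the probability of surviving and remaining in stage $i$, $G_i$ the probability of surviving and growing into stage $i+1$, and $F_3$ the fecundity of the last stage. One common way to pose the inverse problem is as a constrained least-squares fit to the observed vectors:

$$
\mathbf{n}(t+1) = \mathbf{A}(\boldsymbol\theta)\,\mathbf{n}(t),
\qquad
\mathbf{A}(\boldsymbol\theta) =
\begin{pmatrix}
P_1 & 0 & F_3 \\
G_1 & P_2 & 0 \\
0 & G_2 & P_3
\end{pmatrix},
\qquad
P_i, G_i \ge 0,\;\; P_1 + G_1 \le 1,\;\; P_2 + G_2 \le 1,\;\; P_3 \le 1,\;\; F_3 \ge 0,
$$

$$
\hat{\boldsymbol\theta} = \arg\min_{\boldsymbol\theta} \sum_{t=0}^{T-1} \bigl\lVert \mathbf{n}_{\mathrm{obs}}(t+1) - \mathbf{A}(\boldsymbol\theta)\,\mathbf{n}_{\mathrm{obs}}(t) \bigr\rVert^{2}.
$$

Because every entry of $\mathbf{A}$ is linear in $\boldsymbol\theta$, this one-step objective is quadratic in $\boldsymbol\theta$ and its constraints are linear, which is what makes a quadratic-programming solution possible. Fitting whole projected trajectories instead gives a non-quadratic objective, one situation in which a search heuristic such as a genetic algorithm becomes attractive.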
Since all demographic analyses depend on how well the estimated matrix parameters represent population dynamics, a genetic algorithm for inverse parameter estimation was used to find a better model fit to the observed stage-class distributions. These results were compared with those obtained by the quadratic programming method [3], which determines the set of parameters that minimizes the residual between the collected data and the model output.

References:
1. Macfadyen, S., Banks, J.E., Stark, J.D., Davies, A.P., 2014. Using semifield studies to examine the effects of pesticides on mobile terrestrial invertebrates. Annu. Rev. Entomol. 59, 383-404.
2. Caswell, H., 2001. Matrix Population Models, second ed. Sinauer Associates Inc., Massachusetts.
3. Wood, S.N., 1994. Obtaining birth and mortality patterns from structured population trajectories. Ecol. Monogr. 64, 23-44.
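The abstract does not describe the genetic algorithm, so the following is only a minimal sketch of GA-based inverse estimation under explicit assumptions: a hypothetical two-stage (juvenile/adult) matrix with parameters P1, G1, P2 and F2; a fitness equal to the squared residual between the observed series and a trajectory projected from the first observed vector; and truncation selection with elitism, uniform crossover, and Gaussian mutation. The synthetic, noise-free series stands in for real aphid counts.

```python
import random


def project(params, n0, steps):
    """Two-stage model (juveniles j, adults a): n(t+1) = A n(t), A = [[P1, F2], [G1, P2]]."""
    p1, g1, p2, f2 = params
    j, a = n0
    traj = [(j, a)]
    for _ in range(steps):
        j, a = p1 * j + f2 * a, g1 * j + p2 * a
        traj.append((j, a))
    return traj


def sse(params, observed):
    """Squared residual between the observed series and the projected one."""
    sim = project(params, observed[0], len(observed) - 1)
    return sum((s - o) ** 2 for sv, ov in zip(sim, observed) for s, o in zip(sv, ov))


def repair(params):
    """Enforce P1, G1, P2, F2 >= 0, P1 + G1 <= 1 and P2 <= 1."""
    p1, g1, p2, f2 = (max(0.0, v) for v in params)
    if p1 + g1 > 1.0:
        total = p1 + g1
        p1, g1 = p1 / total, g1 / total
    return (p1, g1, min(p2, 1.0), f2)


def random_params(rng):
    s1, split = rng.random(), rng.random()  # juvenile survival, share of survivors that grow
    return (s1 * (1 - split), s1 * split, rng.random(), rng.uniform(0.0, 5.0))


def ga_fit(observed, pop_size=60, generations=200, mut_sd=0.05, seed=0):
    """Elitist GA: keep the best quarter, refill with uniform crossover + Gaussian mutation."""
    rng = random.Random(seed)
    pop = [random_params(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda th: sse(th, observed))
        history.append(sse(pop[0], observed))
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            mum, dad = rng.sample(elite, 2)
            child = tuple(rng.choice(genes) + rng.gauss(0.0, mut_sd) for genes in zip(mum, dad))
            children.append(repair(child))
        pop = elite + children
    pop.sort(key=lambda th: sse(th, observed))
    return pop[0], history


true_params = (0.3, 0.5, 0.8, 1.2)                # hypothetical "true" matrix entries
observed = project(true_params, (10.0, 5.0), 8)   # synthetic, noise-free series
best, history = ga_fit(observed)
print(best, history[-1])
```

Because the elite survive unchanged, the best residual recorded in `history` can never increase. Replacing `sse` with the one-step residual would give an objective that is quadratic in the parameters and could be handed to a quadratic-programming solver; the GA does not need that structure.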

    Ensembles of Probabilistic Principal Surfaces and Competitive Evolution on Data: Two Different Approaches to Data Classification

    Probabilistic Principal Surfaces (PPS) offer powerful visualization and classification capabilities and overcome most of the shortcomings of other neural tools such as SOM and GTM. More specifically, PPS build a probability density function of a given data set of patterns lying in a D-dimensional space (with D ≫ 3), which can be expressed in terms of a limited number of latent variables lying in a Q-dimensional space (Q is usually 2 or 3) that can be used to visualize the data in the latent space. PPS can also be arranged in ensembles to tackle very complex classification tasks. Competitive Evolution on Data (CED), instead, is an evolutionary system in which candidate solutions (cluster centroids) compete to conquer as many resources (data points) as possible and thus partition the input data set into clusters. We discuss the application of Spherical-PPS to two data sets, one from astronomy (the Great Observatories Origins Deep Survey) and one from genetics (microarray data from the yeast genome), and the application of CED to the genetics data only.
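For orientation, the latent-variable density described above can be written in the form PPS inherit from GTM, which they generalize: a mixture of $M$ Gaussians whose centres are the images of latent nodes $\mathbf{x}_m$ under a smooth map $\mathbf{W}\boldsymbol\phi(\cdot)$. The notation ($\mathbf{W}$, $\boldsymbol\phi$, $\boldsymbol\Sigma_m$, $R_m$) is ours, and the exact PPS covariance is left abstract:

$$
p(\mathbf{t}) = \frac{1}{M}\sum_{m=1}^{M} \mathcal{N}\!\left(\mathbf{t} \mid \mathbf{W}\boldsymbol{\phi}(\mathbf{x}_m),\, \boldsymbol{\Sigma}_m\right),
\qquad \mathbf{t}\in\mathbb{R}^{D},\;\; \mathbf{x}_m\in\mathbb{R}^{Q},
$$

$$
R_m(\mathbf{t}) = \frac{\mathcal{N}\!\left(\mathbf{t} \mid \mathbf{W}\boldsymbol{\phi}(\mathbf{x}_m),\, \boldsymbol{\Sigma}_m\right)}{\sum_{m'=1}^{M}\mathcal{N}\!\left(\mathbf{t} \mid \mathbf{W}\boldsymbol{\phi}(\mathbf{x}_{m'}),\, \boldsymbol{\Sigma}_{m'}\right)},
\qquad
\hat{\mathbf{x}}(\mathbf{t}) = \sum_{m=1}^{M} R_m(\mathbf{t})\,\mathbf{x}_m .
$$

GTM uses an isotropic covariance $\boldsymbol\Sigma_m = \beta^{-1}\mathbf{I}$, whereas PPS orient $\boldsymbol\Sigma_m$ so that the variance along the manifold differs from the variance orthogonal to it; in spherical PPS the latent nodes lie on a sphere ($Q = 3$). The responsibility-weighted mean $\hat{\mathbf{x}}(\mathbf{t})$ is one common way to place a $D$-dimensional pattern in the low-dimensional latent space for visualization.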

    TESS discovery of a super-Earth and two sub-Neptunes orbiting the bright, nearby, Sun-like star HD 22946

    We report the Transiting Exoplanet Survey Satellite (TESS) discovery of a three-planet system around the bright Sun-like star HD 22946 (V = 8.3 mag), also known as TIC 100990000, located 63 parsecs away. The system was observed by TESS in Sectors 3, 4, 30 and 31, and two planet candidates, labelled TESS Objects of Interest (TOIs) 411.01 (planet c) and 411.02 (planet b), were identified on orbits of 9.57 and 4.04 days, respectively. In this work, we validate the two planets and recover an additional single transit-like signal in the light curve, which suggests the presence of a third transiting planet with a longer period of about 46 days. We assess the veracity of the TESS transit signals and use follow-up imaging and time-series photometry to rule out false-positive scenarios, including unresolved binary systems, nearby eclipsing binaries, and background/foreground stars contaminating the light curves. Parallax measurements from Gaia EDR3, together with broad-band photometry and spectroscopic follow-up by TFOP, allowed us to constrain the stellar parameters of TOI-411, including its radius of $1.157 \pm 0.025\,R_\odot$. Adopting this value, we determined the radii of the three exoplanet candidates and found that planet b is a super-Earth, with a radius of $1.72 \pm 0.10\,R_\oplus$, while planets c and d are sub-Neptunes, with radii of $2.74 \pm 0.14\,R_\oplus$ and $3.23 \pm 0.19\,R_\oplus$, respectively. Using dynamical simulations, we assessed the stability of the system and evaluated the possible presence of other undetected, non-transiting planets by investigating its dynamical packing. We find that the system is dynamically stable and potentially unpacked, with enough space to host at least one more planet between c and d. (Abridged)

    Comment: 21 pages, 12 figures. Accepted for publication in A&
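As a rough illustration of how the stellar radius feeds into the planet radii, the first-order transit relation depth ≈ (R_p / R_*)^2 can be evaluated in both directions. The sketch ignores limb darkening, dilution and grazing geometry, and assumes the IAU 2015 nominal solar radius and nominal equatorial Earth radius; the depths it prints are only those implied by the quoted radii, not values reported in the paper. It also prints the ratios of the quoted orbital periods, treating "about 46 days" as exactly 46.

```python
from math import sqrt

R_SUN_KM = 695_700.0   # IAU 2015 nominal solar radius
R_EARTH_KM = 6_378.1   # IAU 2015 nominal equatorial Earth radius


def depth_from_radius(r_p_earth, r_star_sun):
    """First-order transit depth: delta = (R_p / R_*)**2."""
    return (r_p_earth * R_EARTH_KM / (r_star_sun * R_SUN_KM)) ** 2


def radius_from_depth(depth, r_star_sun):
    """Inverse relation, returning R_p in Earth radii."""
    return sqrt(depth) * r_star_sun * R_SUN_KM / R_EARTH_KM


R_STAR = 1.157                                # stellar radius quoted above, solar radii
RADII = {"b": 1.72, "c": 2.74, "d": 3.23}     # planet radii quoted above, Earth radii
PERIODS = {"b": 4.04, "c": 9.57, "d": 46.0}   # days; d is only "about 46 days"

for name, rp in RADII.items():
    print(f"planet {name}: implied depth ~ {depth_from_radius(rp, R_STAR) * 1e6:.0f} ppm")
print(f"P_c/P_b = {PERIODS['c'] / PERIODS['b']:.2f}, P_d/P_c = {PERIODS['d'] / PERIODS['c']:.2f}")
```

The implied depths range from roughly 0.02% for b to about 0.07% for d. The period ratio between d and c (about 4.8) is roughly twice that between c and b (about 2.4), which is consistent with, though no substitute for, the dynamical-packing result that there is room for another planet between c and d.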